High-Performance Real-Time FIR-Filtering Using Fast Convolution on Graphics Hardware
In this paper we examine how graphics hardware can be used for real-time FIR filtering. We implement uniformly-partitioned fast convolution in the frequency domain and evaluate its performance on an NVIDIA GTX 285 graphics card. Motivated by audio rendering for virtual reality, we focus on large-scale real-time filtering with many channels, long impulse responses and low latencies. Graphics hardware has already been used for audio signal processing, including FIR and IIR filtering in both offline and real-time settings. However, combining GPU computing with real-time constraints raises a number of challenges that have not been reviewed in detail. The new contribution of this paper is an implementation and detailed analysis of a frequency-domain fast convolution method on GPUs. We discuss specific problems that emerge under real-time conditions. Our method achieves outstanding real-time filtering performance. We consider not only time-invariant filtering but also time-varying filtering, where filters are exchanged at runtime. Furthermore, we examine how distributing the computation across CPU and GPU can maximize performance. Finally, we identify bottlenecks and explain their impact on filter exchange latencies and update rates.
Optimal Filter Partitions for Real-Time FIR Filtering using Uniformly-Partitioned FFT-based Convolution in the Frequency-Domain
This paper concerns highly efficient real-time FIR filtering with low input-to-output latencies. For this type of application, partitioned frequency-domain convolution algorithms are established methods that combine computational efficiency with the required low latencies. Frequency-domain convolution realizes linear FIR filtering by means of circular convolution. Therefore, the transform's period must be shared between input samples and filter coefficients, which in turn affects the filter partitioning. A common choice, found in many publications, is a transform size K=2B, twice the audio streaming block length B. In this publication we review this choice based on a generalized FFT-based fast convolution algorithm with uniform filter partitioning. The correspondence between FFT sizes, filter partitions and the resulting computational costs is examined. We present an optimization technique to determine the best FFT size. The resulting costs for stream filtering and filter transformations are discussed in detail. It is shown that for real-time FIR filtering it is always beneficial to partition filters. Our results provide evidence that K=2B is a good choice, but they also show that an optimal FFT size can achieve a significant speedup for long filters and low latencies.
Keywords: Real-time filtering, Fast convolution, Partitioned convolution, Optimal filter partitioning
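To make the common K=2B configuration concrete, the following is a minimal sketch of uniformly-partitioned overlap-save convolution with a frequency-domain delay line. It is an illustration under stated assumptions, not the implementation evaluated in either paper: it uses NumPy on the CPU, and the class name `UniformPartitionedConvolver` and its `process` method are hypothetical names chosen for this example. The filter is split into sub-filters of length B, each zero-padded to K=2B, so that the last B samples of every circular convolution are free of time aliasing.

```python
import numpy as np


class UniformPartitionedConvolver:
    """Hypothetical sketch: uniformly-partitioned overlap-save FIR filter, K = 2B."""

    def __init__(self, impulse_response, block_length):
        self.B = block_length
        self.K = 2 * block_length  # transform size, the common choice K = 2B
        h = np.asarray(impulse_response, dtype=float)
        # Split the filter into P sub-filters of length B (last one zero-padded).
        self.P = max(1, -(-len(h) // self.B))
        h = np.pad(h, (0, self.P * self.B - len(h)))
        parts = h.reshape(self.P, self.B)
        # Transform every sub-filter once, zero-padded to K samples.
        self.H = np.fft.rfft(parts, n=self.K, axis=1)
        # Frequency-domain delay line: spectra of the last P input windows.
        self.fdl = np.zeros_like(self.H)
        # Sliding time-domain window: previous block followed by current block.
        self.window = np.zeros(self.K)

    def process(self, block):
        """Consume B input samples and return B output samples."""
        block = np.asarray(block, dtype=float)
        assert block.shape == (self.B,)
        # Slide the window by one block and transform it (one FFT per block).
        self.window = np.concatenate((self.window[self.B:], block))
        self.fdl = np.roll(self.fdl, 1, axis=0)
        self.fdl[0] = np.fft.rfft(self.window)
        # Sub-filter p is applied to the input spectrum from p blocks ago.
        Y = np.sum(self.fdl * self.H, axis=0)
        # One inverse FFT per block; keep only the last B (alias-free) samples.
        y = np.fft.irfft(Y, n=self.K)
        return y[self.B:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = 16
    x = rng.standard_normal(8 * B)
    h = rng.standard_normal(37)
    conv = UniformPartitionedConvolver(h, B)
    y = np.concatenate([conv.process(x[i:i + B]) for i in range(0, len(x), B)])
    # The streamed output should match direct linear convolution.
    print(np.allclose(y, np.convolve(x, h)[:len(x)]))
```

Per block, this structure needs one forward FFT, one inverse FFT and P spectrum multiply-accumulates, independent of the number of sub-filters transformed in advance. Choosing a transform size other than K=2B changes the sub-filter length and therefore the balance between transform cost and multiply-accumulate cost, which is the trade-off the abstract refers to.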